Discrepancies Detection in Arabic and English Documents
Authors
Abstract
This paper analyzes and compares the results of methods for discrepancy detection in English and Arabic documents based on character n-gram profiles (the set of normalized character n-gram frequencies of a text). English and Arabic texts were analyzed with respect to a range of statistical characteristics. We describe some statistical differences between the two languages and apply heuristics to measure the dissimilarity between parts of a text. For each text, the results indicate whether the text warrants attention, that is, whether its parts may not have been written by the same author. We evaluate several Arabic and English documents and identify the parts that contain discrepancies and therefore need further analysis for plagiarism detection. The analysis depends on parameters selected in the experiments.
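The abstract does not specify the heuristics, so the following is only a minimal sketch under stated assumptions. It builds a character n-gram profile (normalized frequencies) for the whole document and for sliding windows over it, then scores each window with an assumed normalized dissimilarity: for every n-gram g in the window profile, the term (2(f_part(g) - f_whole(g)) / (f_part(g) + f_whole(g)))^2 is summed and divided by 4 times the window profile size, so scores fall in [0, 1]. Windows above a threshold are flagged as possible discrepancies. The function names `ngram_profile`, `dissimilarity`, and `flag_parts`, as well as the values of `n`, `window`, `step`, and `threshold`, are hypothetical illustrations, not the authors' method or parameters.

```python
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Return the normalized character n-gram frequencies of `text`.

    Python strings are indexed by Unicode code point, so this works for
    Arabic as well as English; any script-specific normalization
    (e.g. of diacritics) would be an extra, assumed preprocessing step.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


def dissimilarity(part: dict[str, float], whole: dict[str, float]) -> float:
    """Assumed normalized dissimilarity in [0, 1] of a part profile vs. a reference profile.

    Each n-gram of the part contributes a squared relative difference in
    [0, 4]; dividing by 4 * |part profile| normalizes the sum to [0, 1].
    """
    if not part:
        return 0.0
    total = 0.0
    for g, f_part in part.items():
        f_whole = whole.get(g, 0.0)
        # f_part > 0 for every key, so the denominator is never zero.
        total += (2 * (f_part - f_whole) / (f_part + f_whole)) ** 2
    return total / (4 * len(part))


def flag_parts(text: str, window: int = 1000, step: int = 500,
               n: int = 3, threshold: float = 0.5) -> list[tuple[int, float]]:
    """Return (start offset, score) for windows whose profile deviates from the whole text.

    All parameter defaults are hypothetical and would need tuning per language.
    """
    whole = ngram_profile(text, n)
    flagged = []
    # A text shorter than `window` yields a single window covering the whole text.
    for start in range(0, max(len(text) - window, 0) + 1, step):
        part = ngram_profile(text[start:start + window], n)
        score = dissimilarity(part, whole)
        if score > threshold:
            flagged.append((start, score))
    return flagged
```

For example, `flag_parts(document_text, window=1000, step=500)` returns the offsets of windows whose n-gram usage differs most from the document as a whole; those windows are candidates for the further plagiarism analysis the abstract mentions. Comparing each part against the whole document (rather than against an external corpus) is one common way to look for style changes inside a single text, but it is an assumption here.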